GLR*: A Robust Parser for Spontaneously Spoken Language
Abstract
This paper describes GLR*, a parsing system based on Tomita's Generalized LR parsing algorithm that was designed to be robust to two particular types of extra-grammaticality: noise in the input and limited grammar coverage. GLR* attempts to overcome these forms of extra-grammaticality by ignoring unparsable words and fragments and searching for the maximal subset of the original input that is covered by the grammar. The parser is coupled with a beam search heuristic that limits the combinations of skipped words considered by the parser and ensures that it operates within feasible time and space bounds. The parsing system includes several tools designed to address the difficulties of parsing spontaneous speech: a statistical disambiguation module, an integrated heuristic for evaluating and ranking the parses produced by the parser, and a parse quality heuristic that allows the parser to self-judge the quality of the parse chosen as best. To evaluate its suitability for parsing spontaneous speech, the GLR* parser was integrated into the JANUS speech translation system. Our evaluations on both transcribed and speech-recognized input indicate that the version of the system that uses GLR* produces about 30% more acceptable translations than a corresponding version that uses the original non-robust GLR parser.
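The idea of skipping unparsable words to find the largest grammatical subset of the input can be illustrated with a small sketch. This is not the GLR* algorithm itself: it is a brute-force toy that assumes a hypothetical `accepts` function standing in for a real parser, a made-up two-sentence "grammar" (`TOY_SENTENCES`), and a simple `max_skips` cap that only loosely plays the role of the beam heuristic described in the abstract.

```python
from itertools import combinations

# Hypothetical toy "grammar": the only word sequences treated as grammatical.
TOY_SENTENCES = {
    ("i", "am", "free", "monday"),
    ("monday", "is", "good"),
}


def accepts(tokens):
    """Stand-in for a real parser: True if the token sequence is grammatical."""
    return tuple(tokens) in TOY_SENTENCES


def best_parsable_subset(words, max_skips=3):
    """Find the largest in-order subset of `words` that the toy grammar accepts.

    Candidates are tried with the fewest skipped words first, so the first
    hit keeps as much of the input as possible. `max_skips` bounds the search
    (an assumed simplification, not the beam heuristic used by GLR*).
    Returns (kept_words, skipped_words), or None if nothing is accepted.
    """
    n = len(words)
    for skips in range(min(max_skips, n) + 1):
        for skipped in combinations(range(n), skips):
            kept = [w for i, w in enumerate(words) if i not in skipped]
            if accepts(kept):
                return kept, [words[i] for i in skipped]
    return None


if __name__ == "__main__":
    noisy = "um i am uh free monday".split()
    print(best_parsable_subset(noisy))
```

A real robust parser avoids enumerating subsets explicitly; the sketch only shows the search objective (keep as many input words as possible) and why some bound on skipped-word combinations is needed to keep the search tractable.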
Related articles
JANUS: a Multi-lingual Speech-to-speech Translation System for Spontaneously Spoken Language in a Limited Domain
Janus is a multilingual speech translation system currently operating in the domain of meeting scheduling. Translating spontaneous speech requires a high degree of robustness to overcome the disfluencies of spoken language as well as errors in speech recognition. In this system description, we focus on the robust speech translation components in Janus: the skipping GLR* parser, the segmentation o...
GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language
The analysis of spoken language is widely considered to be a more challenging task than the analysis of written text. All of the difficulties of written language can generally be found in spoken language as well. Parsing spontaneous speech must, however, also deal with problems such as speech disfluencies, the looser notion of grammaticality, and the lack of clearly marked sentence boundaries. ...
PROFER: predictive, robust finite-state parsing for spoken language
The natural language processing component of a speech understanding system is commonly a robust, semantic parser, implemented as either a chart-based transition network, or as a generalized left-right (GLR) parser. In contrast, we are developing a robust, semantic parser that is a single, predictive finite-state machine. Our approach is motivated by our belief that such a finite-state parser can ult...
Multi-lingual Translation of Spontaneously Spoken Language in a Limited Domain
JANUS is a multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneous conversation in a limited domain. In an attempt to achieve both robustness and translation accuracy we use two different translation components: the GLR module, designed to be more accurate, and the Phoenix module, designed to be more robust. We analyze th...
Comparative Study of GLR Parser with Finite-state Predictors and Chart-based Semantic Parsers
The natural language processing component of a speech understanding system is commonly a robust, semantic parser, implemented as either a chart-based transition network, or as a generalized left-right (GLR) parser. In contrast, we are developing a robust, semantic parser that is a single, predictive finite-state machine. Our approach is motivated by our belief that such a finite-state parser ca...